{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<style>\n",
    "pre {\n",
    " white-space: pre-wrap !important;\n",
    "}\n",
    ".table-striped > tbody > tr:nth-of-type(odd) {\n",
    "    background-color: #f9f9f9;\n",
    "}\n",
    ".table-striped > tbody > tr:nth-of-type(even) {\n",
    "    background-color: white;\n",
    "}\n",
    ".table-striped td, .table-striped th, .table-striped tr {\n",
    "    border: 1px solid black;\n",
    "    border-collapse: collapse;\n",
    "    margin: 1em 2em;\n",
    "}\n",
    ".rendered_html td, .rendered_html th {\n",
    "    text-align: left;\n",
    "    vertical-align: middle;\n",
    "    padding: 4px;\n",
    "}\n",
    "</style>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Caching\n",
    "\n",
    "Vaex can cache task results, such as aggregations, or the internal hashmaps used for `groupby` operations to make recurring calculations much faster, at the cost of calculating cache keys and storing/retrieving the cached values.\n",
    "\n",
    "Internally, Vaex calculates fingerprints (e.g. hashes of data, or file paths and mtimes) to create cache keys that are similar across processes, so that a restart of a process will most likely result in similar hash keys.\n",
    "\n",
    "[See configuration of the cache.](../conf.md#cache)\n",
    "\n",
    "Caches can be turned on globally like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import vaex\n",
    "df = vaex.datasets.titanic()\n",
    "vaex.cache.memory();  # cache on globally"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One can verify that the cache is turned on via:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vaex.cache.is_on()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cache can be globally turned off again:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vaex.cache.off()\n",
    "vaex.cache.is_on()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cache can also be turned on with a context manager, after which it will be turned off again. Here we use a disk cache. Disk cache is shared among processes, and is ideal for processes that restart, or when using Vaex in a web service with multiple workers. Consider the following example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "29.8811345124283\n"
     ]
    }
   ],
   "source": [
    "with vaex.cache.disk(clear=True):\n",
    "    print(df.age.mean())  # The very first time the mean is computed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# outside of the context manager, the cache is still off\n",
    "vaex.cache.is_on()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "29.8811345124283\n"
     ]
    }
   ],
   "source": [
    "with vaex.cache.disk():\n",
    "    print(df.age.mean())  # The second time the result is read from the cache"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vaex.cache.is_on()"
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "2b337e1aa502f5cea9a92c761ad75d3ab5045107ee3446fdbe7f873d4f1936e7"
  },
  "kernelspec": {
   "display_name": "Python 3.8.5 64-bit ('base': conda)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
